AudioX is a unified diffusion transformer model that generates audio and music from arbitrary input content. It produces high-quality general audio and musical compositions, supports flexible control through natural-language prompts, and accepts multimodal inputs within a single model.